Abstract

Background: The increasing availability of HIV-host interaction datasets, including both physical and genetic 
interactions, has created a need for software tools to integrate and visualize the data. Because these host-pathogen 
interactions are extensive and interactions between human proteins are found within many different databases, it 
is difficult to generate integrated HIV-human interaction networks. 

Results: We have developed a web-based platform, termed GPS-Prot (http://www.gpsprot.org), that allows for facile
integration of different HIV interaction data types as well as inclusion of interactions between human proteins
derived from publicly-available databases, including MINT, BioGRID and HPRD. The software can group
proteins into functional modules or protein complexes, generating more intuitive network representations, and also
allows users to upload their own data.

Conclusions: GPS-Prot is a software tool that allows users to easily create comprehensive and integrated HIV-host 
networks. A major advantage of this platform compared to other visualization tools is its web-based format, which 
requires no software installation or data downloads. GPS-Prot allows novice users to quickly generate networks that 
combine both genetic and protein-protein interactions between HIV and its human host into a single 
representation. Ultimately, the platform is extendable to other host-pathogen systems. 



Background 

The application of high-throughput, unbiased, "systems" 
approaches to study host-pathogen relationships is facili- 
tating a shift in focus from the pathogen to the response 
of the host during infection. A more global view of the 
physical, genetic and functional interactions that occur 
during infection will provide a deeper insight into the 
regulatory mechanisms involved in pathogenesis and 
may eventually lead to new cellular targets for therapeu- 
tic intervention. 

Currently, the vast majority of host-pathogen physical 
interaction data involves HIV, for which a large amount 
of physical binding information has historically been 
available, mostly from small-scale, hypothesis-driven 
experiments [1]. For example, the HIV-1 Human 
Protein Interaction Database (HHPID) maintained by
NIAID contains over 2500 functional connections
between individual HIV-1 and human proteins observed over
25 years of research, approximately 30% of which are
classified as physical binding interactions [2]. Another
database, VirusMINT [3], contains a collection of
literature-curated physical interactions for several viruses, the
vast majority corresponding to HIV-1.

Several large-scale, systematic studies using the yeast 
two-hybrid methodology have recently been performed 
for several important human pathogens, including hepa- 
titis C [4], Epstein-Barr [5], and influenza [6] viruses. 
Other approaches, such as those using Protein-fragment 
Complementation Assays (PCA) [7], protein arrays [8], 
or affinity tagging/purification combined with mass 
spectrometry (AP-MS) [9], which have been successfully 
used in other systems [10-13], have not been exploited 
to systematically interrogate host-pathogen physical rela- 
tionships. We have, however, recently carried out the 
first systematic host-pathogen AP-MS study targeting 
HIV-1 using two different cell lines (HEK293 and 
Jurkat) (Jager et al., submitted), which will further
increase the need for tools to visualize and integrate 
host-pathogen interaction datasets. 

In addition to physical interaction studies, functionally 
important factors in HIV biology have also been identi- 
fied by genetic or proteomic profiling screens. These 
studies do not necessarily identify physical binding part- 
ners for pathogenic proteins, but rather often implicate 
pathways or indirect "functional" associations. In 2008, 
three separate siRNA screens were published (Brass, 
Konig, and Zhou datasets) [14-16] that identified host 
genes required for efficient HIV infection. More 
recently, an additional RNAi screen was carried out 
using shRNAs in a potentially more physiologically rele- 
vant Jurkat cell line (Yeung dataset) [17]. RNAi studies 
in mammalian cells are also giving new insights into the 
host response to a number of other pathogenic organ- 
isms, including hepatitis C [18,19], influenza [20-23], 
West Nile [24], and Dengue fever viruses [25]. 

Similarly, several mass spectrometry-based studies 
examined protein expression levels in HIV-infected and 
uninfected cells. For example, Speijer and colleagues 
[26] used a 2D-DIGE approach in the human T-cell line 
PM1 where protein expression was measured following 
HIV infection. Another study examined protein abun- 
dance changes in a CD4 cell line 36 hours post-infection 
[27], whereas the most recent study reports on global 
protein level changes in primary CD4 cells isolated from 
five donors [28], profiling proteomic changes post infec- 
tion in a time-dependent fashion. 

At the most basic level, there exist two different types 
of data (physical vs. functional) and they both provide 
different insights into molecular mechanism. For exam- 
ple, genetic and proteomic profiling screens probing 
HIV-human interactions provide a wealth of data on 



genes and processes that contribute to pathogenesis but 
do not necessarily reflect direct physical connections. 
Conversely, methodologies that probe for physical inter- 
actions often miss crucial functional connections. There- 
fore, poor overlap is often seen when comparing 
datasets derived from these different, but complemen- 
tary platforms. However, even a comparison of datasets 
collected using the same technology can reveal a very 
low overlap. For example, although the three initial HIV
RNAi screens each identified approximately 300 genes
[14-16], only three factors were common to all of them, a
small (albeit statistically significant) overlap [29,30].
Several reasons contribute to this lack of concordance, including differences
in the cell types (e.g., HeLa vs. HEK293T), the RNAi 
approaches and libraries used, as well as the phenotypic 
effects that were monitored. A comparison of all four 
genetic screens, which includes the most recent dataset 
derived from Jurkat cells using an shRNA library [17], 
finds no common factor between them (Figure 1A). In 
fact, only seven of 252 genes in this dataset are shared 
with even one of the other genetic screens (p = 0.654). 
Similarly, proteomic profiling datasets shared a low
number of proteins (three) among all three datasets,
although this is still statistically significant (p < 10^-5,
Figure 1B).

In cases where multiple types of data are available, it 
has been extremely illuminating to combine the diverse 
datasets to identify common pathways, processes, and 
complexes. For example, one recent study combined 
genetic and physical interaction data to identify new 
regulators of Wnt/β-catenin signaling in mammalian
cells [31]. Another study carried out a meta-analysis of
several host-HIV-1 datasets, integrated with host
protein-protein interaction databases, and reported significantly
overrepresented clusters within a network of
host-pathogen and host-host interactions as important
functional modules involved in virulence [29]. Another
recent study identified key processes and host cellular
subsystems impacted by HIV-1 infection by analyzing
patterns of interactions in the HHPID, in combination
with functional annotation and cross-referencing to
global siRNA data [32].

Figure 1 Numerous host factors have been identified for HIV by small-scale and high-throughput experiments, with little overlap
between the various sources. (A) Venn diagram shows overlap from four HIV-based genetic screens [14-17]. Only three intersections show a
significantly higher number of shared genes than expected, which are highlighted in large type. Ten genes are shared between the Brass and
Konig datasets (p = 0.01), 11 between Brass and Zhou datasets (p = 0.0014), and three between Brass, Konig, and Zhou datasets (p = 5 x 10^-5).
None are shared between all four datasets. (B) Venn diagram shows a similar analysis for three HIV-dependent proteomic profiling screens
[26-28]. Large type highlights statistically significant overlaps between the datasets (p < 1 x 10^-4).

In order to facilitate integration and exploration of the 
vast number of HIV-human interactions from different 
databases and data types, we have created a tool, termed 
GPS-Prot, with access to all major HIV-1 and human 
interaction databases as well as an option to overlay 
functional data (e.g. genetic interactions), which requires 
only very basic user input to produce an integrated net- 
work. To our knowledge this is the first tool to combine 
comprehensive HIV-1 and human physical/functional 
interaction data with a graphical viewer and web inter- 
face. Users can thus apply the GPS-Prot platform as a 
"global positioning system" to visualize any human-HIV- 
1 interaction in the context of its landscape of reported 
binding partners. We have also implemented a feature 
for users to securely upload and view their own datasets 
of interest. This software uses a unique graphical inter- 
face based on TouchGraph LLC's Navigator program, 
which has been used for social networking applications 
and which makes navigating and gathering information 
from large networks intuitive and rapid. We therefore 
suggest that GPS-Prot is ideal for a novice user to 
quickly and easily build human-HIV-1 interaction net- 
works from the wealth of published information, or 
from a user's own dataset, and to expand the network 
around a particular protein of interest. 

Implementation 

Analysis of overlapping genes/proteins 

Gene lists were obtained from four genetic screens 
[14-17] and three proteomic profiling studies [26-28] 
and converted to NCBI Entrez gene identifiers. A list of 
published and converted identifiers for all screens can 
be found in Additional file 1 (see Additional file 1: iden- 
tifiers.xls). Statistical significance of gene/protein over- 
laps was calculated using frequency of overlap in size- 
matched, randomly generated datasets. 
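
The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the published analysis: the background gene universe, the number of random trials, and the function name overlap_p_value are all assumptions, since none of them is specified in the text.

```python
import random

def overlap_p_value(set_a, set_b, universe, n_trials=10_000, seed=0):
    """Empirical P(overlap >= observed) using size-matched random gene sets.

    universe, n_trials and seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    universe = list(universe)
    observed = len(set(set_a) & set(set_b))
    hits = 0
    for _ in range(n_trials):
        # Draw random sets with the same sizes as the real datasets.
        rand_a = set(rng.sample(universe, len(set_a)))
        rand_b = set(rng.sample(universe, len(set_b)))
        if len(rand_a & rand_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_trials + 1)
```

With this convention, the smallest reportable p-value is 1 / (n_trials + 1), so the number of trials bounds the resolution of the estimate.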

Development of GPS-Prot 

GPS-Prot is hosted on an Apache 2.0 web server and 
data retrieved from external databases resides in a 
MySQL relational database. Identifiers are mapped to 
Entrez GeneIDs. The logic tier is handled by PHP5 and
the output of each database search is an XML file
describing (1) individual proteins and (2) binary
interactions. This file is passed to the network viewer, a version
of TouchGraph Navigator (a Java applet) that is
customized for our application. A spring-embedded layout is



created within Navigator to view and navigate through 
the network, along with data tables containing informa- 
tion about the proteins and interactions. The Navigator 
applet performs well with up to 100,000 nodes and 
200,000 edges, which is larger than any network that 
typical users will encounter. A connection to the server 
can be established within the applet allowing subsequent 
searches to be carried out by double-clicking on pro- 
teins in the network with the new interactions being 
added to the existing network. 
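
To make the two-part XML output more concrete, the sketch below serializes a list of proteins and a list of binary interactions. It is written in Python purely for illustration (the GPS-Prot logic tier is PHP5), and every element name, attribute name, and identifier in it is hypothetical; the actual schema consumed by the Navigator applet is not described here.

```python
import xml.etree.ElementTree as ET

def build_network_xml(proteins, interactions):
    """Serialize proteins and binary interactions to an XML string.

    proteins: dict mapping an identifier to a display symbol.
    interactions: iterable of (id_a, id_b, score) tuples.
    Element and attribute names are hypothetical, not the GPS-Prot schema.
    """
    root = ET.Element("network")
    protein_list = ET.SubElement(root, "proteins")
    for protein_id, symbol in proteins.items():
        ET.SubElement(protein_list, "protein", id=str(protein_id), symbol=symbol)
    interaction_list = ET.SubElement(root, "interactions")
    for id_a, id_b, score in interactions:
        ET.SubElement(interaction_list, "interaction",
                      source=str(id_a), target=str(id_b), score=f"{score:.2f}")
    return ET.tostring(root, encoding="unicode")

# Placeholder identifiers "1" and "2" stand in for Entrez GeneIDs.
example = build_network_xml({"1": "Vif", "2": "CUL5"}, [("1", "2", 0.8)])
```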

Human PPIs are taken from six publicly available 
human interaction databases (downloaded June 2011; to 
be updated quarterly): HPRD [33] (Release 8), IntAct 
[34], MINT [35], BioGRID [36], DIP [37], and MIPS 
[38]. VirusMINT [3] (downloaded June 2011, to be 
updated quarterly) is used as the default HIV-human 
interaction database in GPS-Prot. Each interaction is 
linked to PubMed identifiers (PMID) and experimental 
descriptors and all protein identifiers are converted to 
Entrez gene nomenclature to facilitate identification of 
duplicate entries, which are consolidated for scoring 
purposes. The seven functional screens discussed here 
are also searched by default (1763 factors). 
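
The consolidation of duplicate entries mentioned above can be illustrated with a short sketch. It assumes that identifiers have already been converted to Entrez gene identifiers and that an interaction is undirected, so (A, B) and (B, A) from different databases collapse to one record; the record layout and the function name consolidate are assumptions for illustration only.

```python
from collections import defaultdict

def consolidate(records):
    """Merge duplicate interactions reported by different databases.

    records: iterable of (gene_a, gene_b, database, pmid) tuples.
    Returns a dict keyed by the sorted gene pair.
    """
    merged = defaultdict(lambda: {"databases": set(), "pmids": set()})
    for gene_a, gene_b, database, pmid in records:
        # Sorting the pair makes the key independent of reporting order.
        key = tuple(sorted((gene_a, gene_b)))
        merged[key]["databases"].add(database)
        merged[key]["pmids"].add(pmid)
    return dict(merged)
```

Keeping the set of PubMed identifiers per pair is what allows a later scoring step to count independent publications rather than database entries.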

Additional optional databases currently include HIV- 
BIND (a subset of BIND containing HIV-human inter- 
actions) [39], the NIAID HIV-1 Human Database 
(HHPID) [40] from which many of the interactions in 
VirusMINT are derived, CORUM [41], and a published 
set of predicted HIV-human interactions (3372 interac- 
tions) [42]. 

To simplify searching and viewing, we do not separate 
viral proteins according to strains. All interactions 
imported from the various databases are mapped to the 
representative virus protein name. 

To facilitate visualization of large networks, each phy- 
sical interaction in the network is assigned a score. A 
high score indicates that an interaction has been 
reported in several independent publications, or perhaps 
only once, but with a high-confidence experimental 
technique (e.g. NMR or x-ray crystallography). The 
method is a modification of that used by the MINT 
database [35], which has been adapted for use across 
multiple databases, where curation standards and 
reported details of experiments vary (see Additional file 
2; Additional_methods.doc). The optional database of 
CORUM complexes is treated as if all subunits interact 
and scored as 1.0 so that they are retained in the net- 
works at any scoring threshold. The output of a search 
is an XML file, viewed using a customized applet for 
PPIs that appears in the GPS-Prot Navigator window 
(TouchGraph LLC, New York, NY). 
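
The treatment of CORUM complexes, in which every subunit is assumed to interact with every other subunit at a score of 1.0, amounts to expanding each complex into all of its pairwise edges. A minimal sketch, with a hypothetical function name and an illustrative subunit list:

```python
from itertools import combinations

def complex_to_edges(subunits, score=1.0):
    """Expand one complex into all pairwise edges with a fixed score."""
    unique = sorted(set(subunits))
    return [(a, b, score) for a, b in combinations(unique, 2)]

# Illustrative subunit list; a complex of n subunits yields n * (n - 1) / 2 edges.
edges = complex_to_edges(["CUL5", "RBX1", "TCEB1", "TCEB2"])
```

Because the number of edges grows quadratically with the number of subunits, large complexes appear as dense clusters in a spring-embedded layout.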

User upload of data (up to nine datasets) is permitted 
after creating an account at the GPS-Prot website. 
Uploaded data can be of two types: physical interactions 
or genetic/functional interactions. Physical interactions
should be formatted as a two-column list of interacting
proteins (Uniprot or Entrez identifiers, tab delimited;
e.g., a .txt file from Microsoft Excel). Genetic/functional
interactions should be formatted as a single column list 
of Uniprot or Entrez identifiers. At present, only HIV or 
human proteins can be uploaded. 
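
As an illustration of the two accepted upload layouts, the following sketch reads tab-delimited text and decides whether it is a two-column physical interaction list or a single-column functional list. The function name parse_upload and the error handling are assumptions; the validation actually performed by the GPS-Prot server is not described here.

```python
import csv
import io

def parse_upload(text):
    """Return ("physical", [(a, b), ...]) or ("functional", [id, ...])."""
    rows = []
    for row in csv.reader(io.StringIO(text, newline=""), delimiter="\t"):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:  # skip blank lines
            rows.append(cells)
    if not rows:
        raise ValueError("empty upload")
    widths = {len(cells) for cells in rows}
    if widths == {2}:
        return "physical", [tuple(cells) for cells in rows]
    if widths == {1}:
        return "functional", [cells[0] for cells in rows]
    raise ValueError("expected one or two tab-delimited columns on every line")
```

Placeholder identifiers such as "1001" and "2002" can be used to try it; a file mixing one- and two-column lines is rejected rather than guessed at.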

Analysis of overlapping complexes/functional modules 

Datasets were analyzed in terms of subunits of complexes 
or functional modules defined by CORUM [41]. Because 
CORUM includes subunits interacting with multiple 
complexes or subcomplexes, we created an all-against-all 
binary matrix of protein interactions to assign subunits 
to unique complexes or functional modules. This was 
necessary to assign one complex and its subunits to one 
intersection of the datasets. Hierarchical clustering was 
carried out on the matrix using Cluster 3.0 and a branch 
length threshold of 1.6 was used to select clusters from 
the dendrogram, which we defined as our set of com- 
plexes, after some manual refinement (see Additional file 
3: Corum_compl.xls). In total, the set consists of 222 
complexes, containing 1600 subunits (see Additional file 
3: Corum_compl.xls). Genes/proteins from the datasets 
were assigned to complexes/functional modules and the 
overlaps of complexes between the different datasets cal- 
culated. Statistical significance of the number of subunits 
overlapping was calculated using frequency observed in 
size-matched, randomly generated datasets. In addition, 
the significance of the number of subunits identified in 
each complex was calculated using the hypergeometric 
distribution function in Microsoft Excel (see Additional
files 4 and 5: RNAi_compl.xls and Prot_compl.xls). 
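
The per-complex significance was computed in Microsoft Excel; an equivalent calculation can be sketched in Python as the upper tail of the hypergeometric distribution. The parameter names below are assumptions for illustration: N is the size of the background gene set, K the number of subunits in the complex, n the number of genes in the dataset, and k the number of subunits observed in the dataset.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the complex."""
    upper = min(K, n)
    total = comb(N, n)
    # comb() returns 0 for impossible terms, so no lower-bound check is needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

Using the upper tail (at least k subunits) rather than the point probability is the usual choice for an enrichment test; whether the Excel calculation used the tail or the point probability is not stated in the text.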

Identification and verification of Vif complexes 

Vif-binding proteins were identified by affinity tagging/ 
purification combined with mass spectrometry analysis 
(Jager et al., submitted). To investigate further the novel 
interaction with Huwe1, we performed
immunoprecipitations and Western blotting as follows: Plasmids that
express Vif, Vpr, or Nef were constructed by inserting 
cDNA-derived genes into a pcDNA3 vector containing 
C-terminal tandem 2xStrep/3xFLAG tags, and 293 cells 
were transfected using calcium phosphate. Cells were 
harvested two days post-transfection and lysed and 
immunoprecipitated with anti-FLAG M2 affinity resin 
(Sigma) according to manufacturer instructions. Proteins 
eluted with 3xFLAG peptide were analyzed by Western 
blot using anti-Cul5, anti-UPF1 and anti-Elongin B
(TCEB2) (Santa Cruz), anti-FLAG (Sigma), or
anti-Huwe1 (Bethyl Laboratories) antibodies. Western blots
were developed using ECL Plus Western Blotting
Detection System (GE Healthcare).



Results 

Generation of HIV-1-human networks using GPS-Prot

The GPS-Prot platform, found at http://www.gpsprot.org,
allows users to initiate searches either by selecting
an HIV protein from a graphic of the viral genome or 
by entering an HIV or human gene identifier in the 
search box (Figure 2A). A network is then generated 
and visualized (Figure 2B) using data from several pub- 
licly-available protein interaction databases, including 
VirusMINT [3] for HIV-host interactions, and HPRD 
[33], IntAct [34], MINT [43], BioGRID [36], DIP [37] 
and MIPS [38] for interactions between human proteins. 
There are also additional databases that can be selected. 

The GPS-Prot databases selected on the homepage 
can also be searched from within the Navigator window 
by double clicking any node. Thus, it is possible to 
visualize not only the HIV-host interactions but also to 
explore second-shell (or third-shell, etc.) host-host inter- 
actions in an intuitive manner. Figure 2B shows a net- 
work with all human binding partners to the HIV Vif 
protein. In this case, after the initial network of Vif bin- 
ders was built, the binding partners of CUL5, a factor 
hijacked by Vif [44], were added into the network by 
double clicking the CUL5 node (Figure 2B, right-most 
network). 
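
The double-click expansion described above amounts to merging the partners of the selected node into the existing edge set. The sketch below illustrates this with an in-memory lookup table; in GPS-Prot the partners are retrieved from the server, so the partner_lookup dictionary and the function name expand_network are stand-ins for illustration.

```python
def expand_network(edges, node, partner_lookup):
    """Return a new edge set with the partners of `node` added.

    edges: set of frozenset({a, b}) pairs, so direction is ignored.
    partner_lookup: dict mapping a protein to its known partners
    (a stand-in for a server-side database query).
    """
    expanded = set(edges)
    for partner in partner_lookup.get(node, ()):
        if partner != node:
            expanded.add(frozenset((node, partner)))
    return expanded

# Start from a Vif network and expand around CUL5.
network = {frozenset(("Vif", "CUL5"))}
network = expand_network(network, "CUL5", {"CUL5": ["RBX1", "Vif"]})
```

Storing each edge as an unordered pair means that re-adding an interaction already present, such as Vif with CUL5 here, leaves the network unchanged.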

Two text panels are located to the left of the network 
window. The top panel toggles to display two types of 
information depending on what is selected in the net- 
work: details about any protein (node) or any interaction 
(edge) (e.g. panels headed "CUL5" and "Interactions", 
respectively) (Figure 2B). Single clicking any node or 
edge toggles between the windows and includes infor- 
mation about the originating database(s) for the PPI 
(protein-protein interaction), experiment type, links to 
publications, functional information, and Uniprot 
entries. 

Two tabs in the bottom left panel allow users to tog- 
gle between two tables that provide further details about 
the network. The "Protein" tab lists all proteins or 
nodes while the "Interactions" tab lists all interactions 
or edges. By default, a limited amount of information is 
included for each protein or interaction, which can be 
expanded to include additional parameters. For example, 
a useful "keywords" field can be added to the interac- 
tions table when using the NIAID HHPID database, and 
then interactions can be sorted by clicking on the
column headers. Groups of table entries can be selected
(e.g. all having the same keyword), causing them to be
highlighted in the network panel. The search box can be 
used to find any particular protein in the loaded 
network. 

Fahey et al. BMC Bioinformatics 2011, 12:298
http://www.biomedcentral.com/1471-2105/12/298

Figure 2 GPS-Prot: a web-based platform for visualizing diverse HIV-host data. (A) GPS-Prot homepage. Searches are initiated by selecting
databases and an HIV or host protein. (B) A TouchGraph Navigator window is launched to display results of a search, which contains the protein
interaction network. Single clicking any interaction ("edge", or gray line connecting proteins) provides the evidence from the literature for that
interaction in the left-hand panel. Clicking on any protein in the diagram ("node") pulls up details for that protein (e.g. panel labeled CUL5).
There is also a searchable table that can be sorted by score, database or experiment. A new network can be created by double clicking any
protein (node); thus, it is possible to "walk through" the entire HIV-human or human-human interactome.

We have assigned rough "confidence scores" to each
pair-wise interaction based on the number of
independent publications and experimental methods
(see Implementation), similar in concept to the scoring 
used by the MINT database [43]. However, the scores 
used by GPS-Prot are not meant to evaluate the validity 
of interactions in any absolute way, but rather to allow 
users to dynamically change the number of viewed 
nodes by adjusting a confidence score slider in the net- 
work panel (Figure 2B), thereby acting as a filter to help 
visualize large networks with many nodes. The edge line
widths in the network panel are also displayed in
proportion to their scores, and additional quantitative
information about HIV-human interactions can be incorporated
in the future. For example, we have devised the MiST (mass
spectrometry interaction statistics) score to
quantitatively report on interactions derived from systematic
AP-MS studies (Jager et al., submitted) and these values
can be effectively incorporated into GPS-Prot.
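
The effect of the confidence score slider can be pictured as a simple threshold filter over scored edges. The sketch below is an illustration only: the edge representation and the function name filter_network are assumptions, and the Navigator applet's own implementation is not described here.

```python
def filter_network(edges, threshold):
    """Keep edges scoring at or above the threshold, and the nodes they touch.

    edges: iterable of (protein_a, protein_b, score) tuples.
    """
    kept = [(a, b, score) for a, b, score in edges if score >= threshold]
    nodes = {protein for a, b, _ in kept for protein in (a, b)}
    return nodes, kept
```

Since CORUM-derived edges are scored 1.0, they pass any threshold in the range 0 to 1, which is consistent with complexes being retained at every slider setting.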

The Navigator window also includes other features to 
help simplify visualization, such as zoom and spacing 
sliders (Figure 2B) and the ability to resize the informa- 
tion and network panels by dragging borders. Network 
images can be exported using a "Save Image" option 
under the File pulldown menu. Data can also be 
exported in the form of a tab-delimited file by using the 
"Export network" link in the Navigator window. 

Overlay of physical and functional interaction networks 

One challenge in handling large-scale genomic datasets is 
the difficulty in integrating different data types, a task 
accomplished in GPS-Prot by allowing users to view data 
from functional screens in the context of PPI networks. 
By default, GPS-Prot includes seven genetic and proteo- 
mic profiling screens carried out in the context of HIV-1 
infection [14-17,26-28], which are overlaid on the physi- 
cal binding networks (Figure 2). Operationally, the physi- 
cal interaction network is first built from the PPI 
databases (green nodes) and then interactors identified 
by the genetic or proteomic screens are highlighted in 
yellow, with links to publications in the information 
panel. Including functional data in a GPS-Prot search can 
highlight relevant clusters in a network. For example, the 
well-established complex of Vif with TCEB1 (Elongin C), 
TCEB2 (Elongin B) (which forms a larger complex with 
the Ring Box protein RBX1, and CUL5) [44], is easily 
noted in Figure 2B, as the Elongin subunits are high- 
lighted in yellow based on RNAi and proteomic profiling 
screens. The importance of this complex during the HIV 
life cycle is well appreciated, as Vif targets APOBEC3G 
for degradation during the course of infection [44].
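
The two-step overlay, in which the physical network is built first and functional hits are highlighted afterwards, can be sketched as a node annotation step. The colors follow the description above (green for PPI nodes, yellow for nodes also found in a functional screen); the data structures and the function name overlay_functional_hits are assumptions for illustration.

```python
def overlay_functional_hits(network_nodes, screen_hits):
    """Annotate each node of a physical network with a color and its screens.

    screen_hits: dict mapping a screen name to the set of genes it reported.
    Only nodes already present in the physical network are annotated.
    """
    annotated = {}
    for node in network_nodes:
        screens = sorted(name for name, genes in screen_hits.items() if node in genes)
        color = "yellow" if screens else "green"
        annotated[node] = {"color": color, "screens": screens}
    return annotated
```

Note that functional hits absent from the physical network are not added as new nodes in this sketch; they only change the appearance of nodes that the PPI databases already supplied.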

Use of CORUM to identify complexes involved in HIV 
function 

Another important feature of GPS-Prot is the ability to 
group subunits of complexes together by including data 



from the CORUM database [41], a collection of manu- 
ally curated mammalian protein complexes. To date, 
there are several examples of HIV proteins interacting 
with well-characterized human complexes. For example, 
Tat interacts with CCNT1/CDK9, components of the 
elongation factor pTEFb, along with the chromatin reg- 
ulators, AFF4, ENL, ELL, and AF9 [45,46], a complex 
important for transcriptional activation, and as pre- 
viously mentioned, Vif hijacks a multi-subunit ubiquitin 
ligase complex containing Cul5, thus targeting APO- 
BEC3G to the proteasome for degradation [44]. Analyz- 
ing and visualizing datasets in terms of complexes can 
increase agreement between different functional screens, 
which often have little overlap at the individual gene or 
protein level (Figure 1; [29]). 

We used the CORUM database to identify statistically 
significant overlaps between genetic and proteomic 
screens. Initially, we found that the four HIV RNAi 
screens [14-17] are enriched for proteins that are part of 
protein complexes (Figure 3A), as annotated by 
CORUM. This trend was also observed for other small 
viruses for which RNAi data is available (Figure 3A), 
including hepatitis C [18,19] and influenza [20,22,23]. 
To see how these trends compared to genetic data 
derived from a bacterial pathogen, we analyzed a recent 
RNAi screen that assessed effects of Mycobacterium 
tuberculosis (Mtb) infection [47]. In this case we found 
no strong enrichment for subunits of protein complexes 
within the dataset (Figure 3A, p = 0.05). This was not 
due to an abundance of weakly expressing genes in the 
Mtb screen that could cause under-representation in the 
CORUM database (Additional file 6; Figure S1.doc). The
observation that HIV and other viruses appear to target
larger molecular machines compared to Mtb is
consistent with the hypothesis that the significantly smaller
genome of HIV (15 proteins vs. ~4000 in Mtb) requires it
to physically hijack a greater proportion of the
host machinery.

Our analysis also shows that HIV-1 RNAi datasets
have a greater intersection when they are analyzed in
terms of multi-subunit complexes rather than as
individual factors. The tables in Figure 4 show the number of
subunits from the same complex identified in the RNAi
(Figure 4A) and proteomic screens (Figure 4B). For
example, both the spliceosome and proteasome were
identified in all four genetic screens and included 34
subunits (p = 4.0 x 10^-4) of these two complexes (20
and 14 subunits, respectively) (p = 2.9 x 10^-6, p = 4.8 x
10^-9 respectively) (Additional file 4: RNAi_compl.xls). In
all, 48 proteins (p = 1.7 x 10^-4) belonging to eight
separate complexes and 40 proteins (p = 2.5 x 10^-3)
belonging to 17 separate complexes were identified in three
and two screens, respectively (Additional file 4:
RNAi_compl.xls). Collectively, 1014 proteins were
identified across the four RNAi screens, of which 122 are
found in at least two screens when analyzed in the
context of a protein complex (p < 10^-5).

Figure 3 Viral RNAi screens are enriched for host factors that are subunits of human complexes. (A) All viral RNAi screens identify
significantly more human complex subunits than expected (HIV 23%, influenza 25%, and hepatitis C 24%), compared to the number of
proteins in the human genome assigned to complexes by CORUM (12%). P values shown are based on the hypergeometric distribution. We find
no strong enrichment of protein complexes in a screen of Mtb host factors (13%). (B) Network of Vif interactors from GPS-Prot using the
optional NIAID HIV-1-human interactions database, instead of VirusMINT. Including CORUM as a database brings complex subunits closer
together in the network, for example the cluster of proteasome complex subunits shown to the lower left (e.g. PSMA, PSMB, PSMC, etc).

A similar concordance is found in the proteomic
profiling datasets when analyzed in the context of protein
complexes (Figure 4B, Additional file 5: Prot_compl.xls).
In total, 120 complexes are implicated in HIV function
across the seven datasets (Additional files 4 and 5:
RNAi_compl.xls and Prot_compl.xls). Some complexes were
identified by both technologies, including the proteasome 
(Figure 4A and 4B), while others were only significantly 
enriched in one, such as ESCRT III in the proteomic pro- 
filing screens. Overall, 38 complexes are identified by 
both genetic and proteomic profiling, 48 by genetic 
screening alone, and 34 by proteomic profiling alone. 
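
The complex-level comparison can be sketched as follows: each screen is mapped onto a set of complexes, and a complex counts as shared when different screens hit any of its subunits, even if no individual gene is shared. The data structures, the function name screens_per_complex, and the toy gene names are assumptions for illustration, not the published analysis code.

```python
def screens_per_complex(screens, complexes):
    """Map each complex to the set of screens that hit at least one subunit.

    screens: dict mapping a screen name to the set of genes it reported.
    complexes: dict mapping a complex name to its set of subunits.
    """
    return {
        complex_name: {name for name, genes in screens.items() if genes & subunits}
        for complex_name, subunits in complexes.items()
    }

# Toy example: the two screens share no gene but hit the same complex.
screens = {"screen_a": {"PSMA1", "GENE_X"}, "screen_b": {"PSMB2"}}
complexes = {"proteasome": {"PSMA1", "PSMB2", "PSMC3"}}
shared = screens_per_complex(screens, complexes)
```

In the toy example the gene-level intersection of the two screens is empty, while the proteasome is supported by both, which is the kind of agreement the complex-level analysis is designed to reveal.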

To confirm this analysis, we sought to verify one of 
these identified complexes experimentally. This was 
accomplished by knockdown of a set of Mediator
subunits that were not identified in any screen as host
factors (gray subunits in Figure 4). We found that RNAi
targeted to one of these, MED30, strongly inhibited 
early-stage HIV replication without inducing toxicity 
(Additional file 7; Figure S2.doc). MED30 is contained 
within the head module of Mediator, one of four func- 
tionally distinct sub-complexes [48], and is required for 
promoter recognition [49] and assembly/stabilization of 
transcription pre-initiation complexes [50,51].
Interestingly, RNAi knockdown of 8 out of 11 (p = 0.007) head
module factors (including MED30) affects replication,
while no protein in the Cdk8 module was identified in
any of the RNAi screens (see Additional file 4:
RNAi_compl.xls).



Based on this analysis, we conclude that analyzing the 
genetic data in the context of complexes is useful for 
identifying statistically significant factors affecting HIV 
function. Allowing users to optionally select CORUM in 
GPS-Prot permits a similar analysis, albeit at a visual 
level, by highlighting complexes with different subunits 
that have been identified in different screens. We have 
found that including data from the CORUM database 
can increase the visual overlap between different genetic 
and proteomic screens and allow users to disentangle 
biochemical complexes from broader biological pro- 
cesses. Figure 3B shows the visual advantage of includ- 
ing CORUM in a search; in this case, using it in 
conjunction with the NIAID HIV-1-human interactions
database. GPS-Prot presumes an edge between all
members of a complex, bringing members in the network
into a very dense cluster of nodes. As shown in Figure
4, different subunits of the proteasome are identified in
all seven HIV functional screens. The proteasome is
much more clearly identified as a complex in GPS-Prot
when CORUM data is included.

The approach of combining information from differ- 
ent screens, particularly those utilizing different technol- 
ogies, is effective, in part, because many screens do not 
reach saturation. There can also be a high false negative
rate (e.g. known binders of HIV proteins, such as Cyclin
T1, are not found in some screens) or false positive rate,
due to off-target effects and variable expression of host
factors in different cell lines. Analysis in the context of
complexes compensates to some extent for these
limitations by identifying overlaps between datasets, especially
when saturation is not reached.

Figure 4 Five complexes implicated in HIV pathogenesis by analysis with CORUM. (A) Network analysis of RNAi datasets. Gray nodes are
subunits present in the complex according to the CORUM database. Colored subunits (nodes) were reported in one or more of the genetic
screens. Based on the hypergeometric distribution, we find significantly more subunits of the proteasome (p = 4.2 x 10^-9), Mediator (p = 1.1 x
10^-9), and the exosome (p = 2.1 x 10^-3) than expected. Subunits of ESCRT III and CCT complexes are not significantly enriched. The table shows
the number of complexes and subunits identified by two, three or four RNAi screens. As with genetic screens, there is greater overlap between
datasets when analyzed in terms of subunits of complexes as opposed to isolated proteins. (B) Network analysis of proteomic profiling datasets.
The same complexes are shown as in panel A, with subunits highlighted as they occur in different datasets. Mediator and exosome complexes
are not covered more than expected, but significantly more subunits than expected are found for ESCRT III (p = 8.4 x 10^-3) and CCT complexes
(p = 2.0 x 10^-7). The proteasome is the only complex where more subunits than expected are identified by both genetic and proteomic
profiling screens (p = 7.0 x 10^-23).

Upload of user-generated data

According to the HHPID database, numerous host fac- 
tors (up to several hundred) may interact with any given 
HIV-1 protein. In addition, RNAi screens alone have 
added more than 800 unique host factors to the current 
datasets. The continuing issue when obtaining new data- 
sets is to distinguish between relevant hits and noise, 
which can be aided, as we have shown, by combining 
multiple datasets and/or analyzing the data in the con- 
text of protein complexes. To address this need, GPS- 
Prot allows users to create an account and upload up to 
nine in-house datasets to be included in the interaction 
networks. The set can describe physical interactions, 
consisting of a list of binary interacting proteins, or sim- 
ply a list of genes/proteins such as that generated by 
RNAi or proteomic profiling screens (see Implementa- 
tion for details). 
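
The two kinds of uploaded dataset can be distinguished programmatically, as in the following sketch. The file layout assumed here (one identifier per line for a gene list, two whitespace-separated identifiers per line for a binary interaction) is a hypothetical simplification; the format actually accepted by GPS-Prot is described in Implementation and is not reproduced here. The function name is likewise illustrative.

```python
def parse_upload(text):
    """Classify a dataset as 'interactions' or 'gene_list' and parse it.

    Assumed layout (hypothetical): each non-empty line holds either one
    identifier or two whitespace-separated identifiers.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty dataset")
    widths = {len(row) for row in rows}
    if widths == {2}:
        # Binary interactions are treated as undirected, so each pair is
        # stored in sorted order to make A-B and B-A identical.
        return "interactions", {tuple(sorted(row)) for row in rows}
    if widths == {1}:
        return "gene_list", {row[0] for row in rows}
    raise ValueError("mixed or unsupported row layout")


kind, entries = parse_upload("VIF CUL5\nVIF TCEB1\n")
```

Rejecting files with mixed row widths, rather than guessing, keeps a malformed upload from being silently interpreted as the wrong data type.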

We used this feature to analyze a partial dataset from 
our ongoing project to determine a comprehensive 
human-HIV-1 interaction map using AP-MS [52] (Jager 
et al., submitted). We obtained preliminary interaction 
data for Vif by transiently expressing and purifying a C- 
terminally 3xFLAG tagged version from HEK293 cells 
and analyzed the associated proteins by mass spectro- 
metry. We then uploaded these data into GPS-Prot, to 
view in the context of previously reported Vif binders 
(Figure 5A; uploaded data are marked with red tags). 
The most well-characterized Vif partners, TCEB1 (Elon- 
gin C), TCEB2 (Elongin B), and CUL5 (circled in red 
and highlighted in the lower left table), were present in 
the AP-MS dataset and two of these (TCEB1 and 
TCEB2) were also found in RNAi and/or proteomic 
screens (yellow nodes). Interestingly, of the four remaining 
proteins observed both by AP-MS and in the screens 
(yellow and red-tagged), three, PSME3 (a proteasome 
subunit), HUWE1 (an E3 ligase), and UBL4A (a 
ubiquitin-like protein), have functions that may relate to 
the role of Vif in ubiquitin-tagging substrates for proteasomal 
degradation. Because HUWE1 acts during the late 
stages of HIV infection [14], when Vif is believed to 
function, we retested the Vif-HUWE1 interaction by 
immunoprecipitation (IP)-Western blotting using an 
antibody against HUWE1 and indeed observed strong and 
specific binding (Figure 5B). It will be of great interest 
to determine whether Vif itself is targeted for ubiquitination 
by HUWE1 or whether HUWE1 might be a second 
ubiquitin ligase recruited by Vif to tag APOBEC3G or 
other as-yet-unidentified targets for degradation. 

Comparison with other platforms 

There are a number of tools for visually exploring biological 
networks, such as PINA [53], STRING [54], Cytoscape 
[55], and others (reviewed in [56]). Some 
standalone databases are also integrated with viewers, 
such as the MINT database [57]. Others are linked to 
external viewers such as Osprey [58] for BioGRID data- 
base interactions or the Cytoscape plugin MiSink for 
DIP interactions [59]. Alternatively, sites like STRING 
and APID/APID2NET have plug-ins for Cytoscape [60] 
and integrate interactome data from multiple PPI 
databases. 
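
Integrating interactions from several PPI databases amounts to merging edge lists while keeping track of where each edge came from. The sketch below shows one simple way to do this; the database and protein names are placeholders, and the code is not taken from GPS-Prot or from any of the tools cited above.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge undirected edges from several sources, keeping provenance.

    `sources` maps a database name to an iterable of (protein_a, protein_b)
    pairs. The result maps each unordered pair to the set of databases
    that report it.
    """
    merged = defaultdict(set)
    for db_name, edges in sources.items():
        for a, b in edges:
            # frozenset makes (a, b) and (b, a) the same key.
            merged[frozenset((a, b))].add(db_name)
    return dict(merged)


# Hypothetical toy input (placeholder names).
sources = {
    "db_1": [("P1", "P2"), ("P2", "P3")],
    "db_2": [("P2", "P1")],
}
merged = merge_interactions(sources)
```

Recording the supporting databases per edge, rather than simply taking the union, lets a viewer show how many independent sources back each interaction.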

Many of the existing network analysis platforms, how- 
ever, do not include HIV-host interactions, or virus-host 
interactions in general, and also require varying degrees 
of expert knowledge to produce and navigate networks. 
Thus, there is a need to integrate and synthesize the 
abundant HIV-host physical and genetic interaction 
information (or more generally host-pathogen informa- 
tion) from public repositories. PIG [61] and VirusMINT 
[3] have taken steps in this direction by creating data- 
bases that contain a substantial number of physical HIV 
interactions, along with other physical virus-host inter- 
actions. CAPIH is a tool that provides a web interface 
for accessing physical host-HIV interactions [62] in the 
context of comparative genome analysis and provides 
information about the differences in sequences between 
interacting proteins of model organisms (chimpanzee, 
rhesus macaque, and mouse). Also, a web version of 
JNets [63] allows users to view a global network repre- 
sentation of the HHPID HIV-host interactions and 
explore that network using the underlying annotations, 
such as Gene Ontology (GO) annotation or HHPID 
keywords. 

Aside from the issue of integrating physical and 
genetic virus-host data, it has been noted that some bio- 
logical network tools utilize generic graph drawing tools 
that are not necessarily intuitive to most biologists [56]. 
We took an alternative approach of harnessing a com- 
mercial viewer (TouchGraph Navigator), which has been 
developed for non-scientific applications including social 
network analysis, and modifying it in collaboration with 
its designers for our scientific application. 

GPS-Prot also allows users to include information 
about complexes through inclusion of data from the 
CORUM database. Our results suggest this approach 
may be particularly suited to viruses or other pathogens 
that rely extensively on multi-subunit host machinery, 
as indicated by our preliminary comparison with the 
bacterial pathogen Mtb. However, the vast majority of 
data available are from viral pathogens, and more studies 
of microbial pathogens are required to definitively tease 
apart the differences. 

Conclusions 

As high-throughput technologies identify more host factors 
that physically associate with viral factors, it is vital 
to integrate this information with other, diverse types of 
data, such as genetic and proteomic profiling, and to 
provide tools to visualize them in intuitive ways. GPS-Prot 
provides such a tool by aggregating several major 
databases for physical virus-host and host-host PPIs and 
overlaying HIV-1 genetic/proteomic profiling data, in 
addition to allowing upload of new user-generated data. 

Figure 5 User-generated data can be uploaded and viewed in the context of complete PPI networks from public databases. (A) Vif 
network from GPS-Prot, including an uploaded dataset from AP-MS experiments (red-tagged nodes). HUWE1 is among several proteins in the 
uploaded dataset (Jager et al., submitted) that are not found in other databases (e.g., not present in Figure 2B) but were previously 
identified by genetic/proteomic screens. (B) HIV Vif interacts with endogenous HUWE1 in 293 cells. 3xFLAG-tagged Vif, Vpr, and Nef were 
immunoprecipitated with anti-FLAG agarose beads. Lysates (L), remaining supernatant (S) and eluates (E) were analyzed by SDS-PAGE and 
Western blotting with antibodies as indicated. The same band is identified in the Vif pulldown by antibodies against the known CUL5 E3 ligase 
complex, anti-CUL5 (not shown) and anti-ELOB (TCEB2) as well as anti-HUWE1 antibodies, but not by the control anti-UPF1 antibody. 

A next goal is to extend the GPS-Prot infrastructure to 
other pathogens, particularly viruses. Currently very few 
have datasets as large as HIV-1, particularly with regard 
to the physical interactome of each viral protein. We 
have collected physical interaction datasets derived from 
AP-MS studies for HIV-1 in HEK293 and Jurkat cells 
that will be included in the GPS-Prot set of databases 
(Jager et al., submitted). Finally, we also intend to expand 
these analyses to other pathogens in the near future. 
Availability and Requirements 

GPS-Prot is freely available to all users with Java-enabled 
web browsers (best viewed with Safari and Firefox) 
at http://www.gpsprot.org. GPS-Prot was coded 
using XHTML, CSS, PHP, XML, Java, MySQL and 
jQuery. 

Additional material 
PPI: Protein-protein interaction 